Abstract

Stochastic And-Or grammars (AOG) extend traditional stochastic grammars
of language to model other types of data such as images and events.
In this paper we propose a representation framework of stochastic AOGs 
that is agnostic to the type of the data being modeled and thus unifies 
various domain-specific AOGs. Many existing grammar formalisms and 
probabilistic models in natural language processing, computer vision, and 
machine learning can be seen as special cases of this framework. We also 
propose a domain-independent inference algorithm of stochastic context-free
AOGs and show its tractability under a reasonable assumption. Furthermore,
we provide two interpretations of stochastic context-free AOGs
as a subset of probabilistic logic, which connects stochastic AOGs to the 
field of statistical relational learning and clarifies their relation with a few 
existing statistical relational models. 


1 Introduction 

Formal grammars are a popular class of knowledge representation that is traditionally
confined to the modeling of natural and computer languages. However,
several extensions of grammars have been proposed over time to model other
types of data such as images [?] and events [?]. One prominent type
of extension is stochastic And-Or grammars (AOG) [2]. A stochastic AOG simultaneously
models compositions (i.e., a large pattern is the composition of
several small patterns arranged according to a certain configuration) and reconfigurations
(i.e., a pattern may have several alternative configurations), and
in this way it can compactly represent a probability distribution over a large
number of patterns. Stochastic AOGs can be used to parse data samples into
their compositional structures, which helps solve multiple tasks (such as classification,
annotation, and segmentation of the data samples) in a unified manner.
In this paper we will focus on the context-free subclass of stochastic AOGs,
which serves as the skeleton in building more advanced stochastic AOGs.

*This work was supported by the National Natural Science Foundation of China
(61503248).

Several variants of stochastic AOGs and their inference algorithms have been
proposed in the literature to model different types of data and solve different
problems, such as image scene parsing [7] and video event parsing [?]. Our first
contribution in this paper is that we provide a unified representation framework
of stochastic AOGs that is agnostic to the type of the data being modeled;
in addition, based on this framework we propose a domain-independent inference
algorithm that is tractable under a reasonable assumption. The benefits
of a unified framework of stochastic AOGs include the following. First, such
a framework can help us generalize and improve existing ad hoc approaches
for modeling, inference and learning with stochastic AOGs. Second, it also
facilitates applications of stochastic AOGs to novel data types and problems
and enables the research of general-purpose inference and learning algorithms
of stochastic AOGs. Further, a formal definition of stochastic AOGs as abstract
probabilistic models makes it easier to theoretically examine their relation with
other models such as constraint-based grammar formalisms [?] and sum-product
networks [?]. In fact, we will show that many of these related models can be
seen as special cases of stochastic AOGs.

Stochastic AOGs model compositional structures based on the relations between
sub-patterns. Such probabilistic modeling of relational structures is traditionally
studied in the field of statistical relational learning [?]. Our second
contribution is that we provide probabilistic logic interpretations of the unified
representation framework of stochastic AOGs and thus show that stochastic
AOGs can be seen as a novel type of statistical relational models. The logic
interpretations help clarify the relation between stochastic AOGs and a few
existing statistical relational models and probabilistic logics that share certain
features with stochastic AOGs (e.g., tractable Markov logic [?] and stochastic
logic programs [12]). They may also facilitate the incorporation of ideas from statistical
relational learning into the study of stochastic AOGs and at the same
time contribute to the research of novel (tractable) statistical relational models.

2 Stochastic And-Or Grammars 

An AOG is an extension of a constituency grammar used in natural language
parsing [?]. Similar to a constituency grammar, an AOG defines a set of valid
hierarchical compositions of atomic entities. However, an AOG differs from a
constituency grammar in that it allows atomic entities other than words and
compositional relations other than string concatenation. A stochastic AOG
models the uncertainty in the composition by defining a probability distribution
over the set of valid compositions.

Stochastic AOGs were first proposed to model images [?], in particular
the spatial composition of objects and scenes from atomic visual words
(e.g., Gabor bases). They were later extended to model events, in particular
the temporal and causal composition of events from atomic actions [6] and fluents [?].
More recently, these two types of AOGs were used jointly to model
objects, scenes and events from the simultaneous input of video and text [?].

In each of these previous works, a different type of
data is modeled with domain-specific and problem-specific definitions of atomic
entities and compositions. Tu et al. [?] provided a first attempt towards a more
unified definition of stochastic AOGs that is agnostic to the type of the data
being modeled. We refine and extend their work by introducing parameterized
patterns and relations in the unified definition, which allows us to reduce a
wide range of related models to AOGs (as will be discussed in section 2.1).
Based on the unified framework of stochastic AOGs, we also propose a domain-independent
inference algorithm and study its tractability (section 2.2). Below
we start with the definition of stochastic context-free AOGs, which are the most
basic form of stochastic AOGs and are used as the skeleton in building more
advanced stochastic AOGs.

A stochastic context-free AOG is defined as a 5-tuple $\langle \Sigma, N, S, \theta, R \rangle$:

$\Sigma$ is a set of terminal nodes representing atomic patterns that are not decomposable;

$N$ is a set of nonterminal nodes representing high-level patterns, which is divided
into two disjoint sets: And-nodes and Or-nodes;

$S \in N$ is a start symbol that represents a complete pattern;

$\theta$ is a function that maps an instance of a terminal or nonterminal node $x$ to
a parameter $\theta_x$ (the parameter can take any form such as a vector or a
complex data structure; denote the maximal parameter size by $m_\theta$);

$R$ is a set of grammar rules, each of which takes the form $x \to C$, representing
the generation from a nonterminal node $x$ to a set $C$ of nonterminal or
terminal nodes (we say that the rule is "headed" by node $x$ and the nodes
in $C$ are the "child nodes" of $x$).

The set of rules R is further divided into two disjoint sets: And-rules and Or- 
rules. 


• An And-rule, parameterized by a triple $\langle r, t, f \rangle$, represents the decomposition
of a pattern into a configuration of non-overlapping sub-patterns.
The And-rule specifies a production $r : A \to \{x_1, x_2, \ldots, x_n\}$ for some
$n \geq 2$, where $A$ is an And-node and $x_1, x_2, \ldots, x_n$ are a set of terminal or
nonterminal nodes representing the sub-patterns. A relation between the
parameters of the child nodes, $t(\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_n})$, specifies valid configurations
of the sub-patterns. This so-called parameter relation is typically
factorized into the conjunction of a set of binary relations. A parameter
function $f$ is also associated with the And-rule, specifying how the parameter
of the And-node $A$ is related to the parameters of the child nodes:
$\theta_A = f(\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_n})$. We require that both the parameter relation
and the parameter function take time polynomial in $n$ and $m_\theta$ to compute.
There is exactly one And-rule that is headed by each And-node.

• An Or-rule, parameterized by an ordered pair $\langle r, p \rangle$, represents an alternative
configuration of a pattern. The Or-rule specifies a production
$r : O \to x$, where $O$ is an Or-node and $x$ is either a terminal or a nonterminal
node representing a possible configuration. A conditional probability
$p$ is associated with the Or-rule, specifying how likely the configuration
represented by $x$ is selected given the Or-node $O$. The only constraint in
the Or-rule is that the parameters of $O$ and $x$ must be the same: $\theta_O = \theta_x$.
There typically exist multiple Or-rules headed by the same Or-node, and
together they can be written as $O \to x_1 | x_2 | \cdots | x_n$.

Note that unlike in some previous work, in the definition above we assume 
deterministic And-rules for simplicity. In principle, any uncertainty in an And- 
rule can be equivalently represented by a set of Or-rules each invoking a different 
copy of the And-rule. 
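
To make the definition concrete, the following is a minimal sketch of how the 5-tuple might be held in code. All names here (`AndRule`, `OrRule`, `AOG`, `validate`, and the toy "eyes" grammar) are hypothetical and chosen only for illustration; the check that Or-rule probabilities headed by the same Or-node sum to one is an assumption that follows from reading $p$ as a conditional probability.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Param = Any  # a node parameter; the definition allows any form


@dataclass(frozen=True)
class AndRule:
    head: str                       # And-node A
    children: Tuple[str, ...]       # child nodes x_1..x_n, n >= 2
    relation: Callable[..., bool]   # parameter relation t
    function: Callable[..., Param]  # parameter function f, returns theta_A


@dataclass(frozen=True)
class OrRule:
    head: str    # Or-node O
    child: str   # terminal or nonterminal node x
    prob: float  # conditional probability of choosing x given O


@dataclass
class AOG:
    terminals: set
    and_nodes: set
    or_nodes: set
    start: str
    and_rules: Dict[str, AndRule]      # exactly one And-rule per And-node
    or_rules: Dict[str, List[OrRule]]  # several Or-rules per Or-node

    def validate(self) -> None:
        assert not (self.and_nodes & self.or_nodes), "And/Or nodes must be disjoint"
        assert self.start in self.and_nodes | self.or_nodes
        assert set(self.and_rules) == self.and_nodes
        for rule in self.and_rules.values():
            assert len(rule.children) >= 2
        for head, rules in self.or_rules.items():
            assert head in self.or_nodes
            # Assumption: probabilities of Or-rules sharing a head sum to one.
            assert abs(sum(r.prob for r in rules) - 1.0) < 1e-9


def _right_of(left, right):
    # Hypothetical relation: the second patch lies to the right of the first.
    return right[0] > left[0]


def _midpoint(left, right):
    # Hypothetical function: the parent patch sits at the midpoint of its children.
    return ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)


toy = AOG(
    terminals={"eye"},
    and_nodes={"Eyes"},
    or_nodes={"S", "Eye"},
    start="S",
    and_rules={"Eyes": AndRule("Eyes", ("Eye", "Eye"), _right_of, _midpoint)},
    or_rules={"S": [OrRule("S", "Eyes", 1.0)], "Eye": [OrRule("Eye", "eye", 1.0)]},
)
toy.validate()
```

Here the parameters are 2D positions, so the relation and function are plain Python callables on coordinate pairs; other data types would only change those two callables.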

Fig. 1(a) shows an example stochastic context-free AOG of line drawings.
Each terminal or nonterminal node represents an image patch and its parameter 
is a 2D vector representing the position of the patch in the image. Each terminal 
node denotes a line segment of a specific orientation while each nonterminal node 
denotes a class of line drawing patterns. The start symbol S denotes a class 
of line drawing images (e.g., images of animal faces). In each And-rule, the 
parameter relation specifies the relative positions between the sub-patterns and 
the parameter function specifies the relative positions between the composite 
pattern and the sub-patterns. 

With a stochastic context-free AOG, one can generate a compositional structure
by starting from a data sample containing only the start symbol S and recursively
applying the grammar rules in R to convert nonterminal nodes in the
data sample until the data sample contains only terminal nodes. The resulting 
compositional structure is a tree in which the root node is S, each non-leaf node 
is a nonterminal node, and each leaf node is a terminal node; in addition, for 
each appearance of And-node A in the tree, its set of child nodes in the tree 
conforms to the And-rule headed by A, and for each appearance of Or-node O 
in the tree, it has exactly one child node in the tree which conforms to one of 
the Or-rules headed by O. The probability of the compositional structure is the 
product of the probabilities of all the Or-rules used in the generation process. 
Fig. 1(b) shows an image and its compositional structure generated from the
example AOG in Fig. 1(a). Given a data sample consisting of only atomic patterns,
one can also infer its compositional structure by parsing the data sample
with the stochastic context-free AOG. We will discuss the parsing algorithm 
later. 
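
The generation process described above can be sketched as a short recursive sampler. This is an illustrative sketch only: the toy grammar (`OR_RULES`, `AND_RULES`) is hypothetical, and node parameters, parameter relations and parameter functions are left out so that only the And/Or structure and the product of Or-rule probabilities are shown.

```python
import random

# Hypothetical toy grammar. Or-rules map an Or-node to (child, probability) pairs;
# And-rules map an And-node to its fixed tuple of children.
OR_RULES = {"S": [("Pair", 0.7), ("a", 0.3)], "X": [("a", 0.4), ("b", 0.6)]}
AND_RULES = {"Pair": ("X", "X")}


def generate(node, rng):
    """Return (tree, probability) for one derivation rooted at `node`."""
    if node in OR_RULES:
        children, probs = zip(*OR_RULES[node])
        idx = rng.choices(range(len(children)), weights=probs)[0]
        subtree, p = generate(children[idx], rng)
        # An Or-node has exactly one child; its rule probability is multiplied in.
        return (node, [subtree]), probs[idx] * p
    if node in AND_RULES:
        prob, subtrees = 1.0, []
        for child in AND_RULES[node]:
            subtree, p = generate(child, rng)
            subtrees.append(subtree)
            prob *= p  # And-rules are deterministic and contribute no factor
        return (node, subtrees), prob
    return (node, []), 1.0  # terminal node


def leaves(tree):
    """List the terminal nodes at the leaves of a generated tree."""
    node, kids = tree
    return [node] if not kids else [x for k in kids for x in leaves(k)]


tree, prob = generate("S", random.Random(0))
print(leaves(tree), prob)
```

The returned probability is the product of the probabilities of the Or-rules chosen along the way, matching the definition in the text.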

Our framework is flexible in that it allows different types of patterns and relations
within the same grammar. Consider for example a stochastic AOG modeling
visually grounded events (e.g., videos of people using vending machines).
We would have two types of terminal or nonterminal nodes that model events
and objects respectively. An event node represents a class of events or sub-events,
whose parameter is the start/end time of an instance event. An object
node represents a class of objects or sub-objects (possibly in a specific state or
posture), whose parameter contains both the spatial information and the time
interval information of an instance object. We specify temporal relations between
event nodes to model the composition of an event from sub-events; we
specify spatial relations between object nodes to model the composition of an
object from its component sub-objects as well as the composition of an atomic
event from its participant objects; we also specify temporal relations between
related object nodes to enforce the alignment of their time intervals.

Figure 1: (a) A graphical representation of an example stochastic AOG of line
drawings of animal faces. Each And-rule is represented by an And-node and
all of its child nodes in the graph. The spatial relations within each And-rule
are not shown for clarity. Each Or-rule is represented by an Or-node and one
of its child nodes, with its probability shown on the corresponding edge. (b) A
line drawing image and its compositional structure generated from the example
AOG. Again, the spatial relations between nodes are not shown for clarity. The
probability of the compositional structure is partially computed at the top right.

Note that different nonterminal nodes in an AOG may share child nodes.
For example, in Fig. 1 each terminal node representing a line segment may actually
be shared by multiple parent nonterminal nodes representing different
line drawing patterns. Furthermore, there could be recursive rules in an AOG, 
which means the direct or indirect production of a grammar rule may contain 
its left-hand side nonterminal. Recursive rules are useful in modeling languages 
and repetitive patterns. 

In some previous work, stochastic AOGs more expressive than stochastic
context-free AOGs are employed. A typical augmentation over context-free
AOGs is that, while in a context-free AOG a parameter relation can only be
specified within an And-rule, in more advanced AOGs parameter relations can
be specified between any two nodes in the grammar. This can be very useful in
certain scenarios. For example, in an image AOG of indoor scenes, relations can
be added between all pairs of 2D faces to discourage overlap [7]. However, such
relations make inference much more difficult. Another constraint in context-free
AOGs that is sometimes removed in more advanced AOGs is the non-overlapping
requirement between sub-patterns in an And-rule. For example, in
an image AOG it may be more convenient to decompose a 3D cube into 2D
faces that share edges [7]. We will leave the formal definition and analysis of
stochastic AOGs beyond context-freeness to future work.

2.1 Related Models and Special Cases 

Stochastic context-free AOGs subsume many existing models as special cases. 
Because of space limitation, here we informally describe these related models 
and their reduction to AOGs and leave the formal definitions and proofs to
Appendix A.

Stochastic context-free grammars (SCFG) are clearly a special case of stochastic
context-free AOGs. Any SCFG can be converted into an And-Or normal
form that matches the structure of a stochastic AOG [?]. In a stochastic AOG
representing a SCFG, each node represents a string and the parameter of a node 
is the start/end positions of the string in the complete sentence; the parameter 
relation and parameter function in an And-rule specify string concatenation, 
i.e., the substrings must be adjacent and the concatenation of all the substrings 
forms the composite string represented by the parent And-node. 
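
As a small illustration of this reduction, the parameter relation and parameter function for string concatenation with two child nodes could look as follows. The half-open `(start, end)` encoding of positions and the names `concat_relation` and `concat_function` are assumptions made for this sketch.

```python
def concat_relation(left, right):
    """Parameter relation t: the two substrings must be adjacent."""
    return left[1] == right[0]


def concat_function(left, right):
    """Parameter function f: the parent string spans both substrings."""
    return (left[0], right[1])


# "the" occupies (0, 1) and "cat" occupies (1, 2) in the sentence "the cat".
the, cat = (0, 1), (1, 2)
if concat_relation(the, cat):
    print(concat_function(the, cat))  # prints (0, 2)
```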

There have been a variety of grammar formalisms developed in the natural
language processing community that go beyond the concatenation relation of
strings. For example, in some formalisms the substrings are interwoven to
form the composite string [?]. More generally, in a grammar rule a linear
regular string function can be used to combine lists of substrings into a list of
composite strings, as in a linear context-free rewriting system (LCFRS) [22]. All
these grammar formalisms can be represented by context-free AOGs with each 
node representing a list of strings, the node parameter being a list of start/end 
positions, and in each And-rule the parameter relation and parameter function 
defining a linear regular string function. Since LCFRSs are known to generate 
the larger class of mildly context-sensitive languages, context-free AOGs when 
instantiated to model languages can be at least as expressive as mildly context- 
sensitive grammars. 

Constraint-based grammar formalisms [?] are another class of natural language
grammars, which associate so-called feature structures to nonterminals
and use them to specify constraints in the grammar rules. Such constraints can
help model natural language phenomena such as English subject-verb agreement
and underlie grammatical theories such as head-driven phrase structure
grammars [23]. It is straightforward to show that constraint-based grammar
formalisms are also special cases of context-free AOGs (with a slight generalization
to allow unary And-rules), by establishing equivalence between feature
structures and node parameters and between constraints and parameter
relations/functions.

In computer vision and pattern recognition, stochastic AOGs have been applied
to a variety of tasks as discussed in the previous section. In addition,
several other popular models, such as the deformable part model [23] and the
flexible mixture-of-parts model [25], can essentially be seen as special cases of
stochastic context-free AOGs in which the node parameters encode spatial information
of image patches and the parameter relations/functions encode spatial
relations between the patches.

Sum-product networks (SPN) [5] are a new type of deep probabilistic models
that extend the ideas of arithmetic circuits [?] and AND/OR search spaces
[27] and can compactly represent many probability distributions that traditional
graphical models cannot tractably handle. It can be shown that any
decomposable SPN has an equivalent stochastic context-free AOG: Or-nodes
and And-nodes of the AOG can be used to represent sum nodes and product
nodes in the SPN respectively, all the node parameters are set to null, parameter
relations always return true, and parameter functions always return null.
Because of this reduction, all the models that can reduce to decomposable SPNs
can also be seen as special cases of stochastic context-free AOGs, such as thin
junction trees [25], mixtures of trees [29] and latent tree models [?].


2.2 Inference 


The main inference problem associated with stochastic AOGs is parsing, i.e.,
given a data sample consisting of only terminal nodes, infer its most likely compositional
structure (parse). A related inference problem is to compute the
marginal probability of a data sample. It can be shown that both problems are
NP-hard (see Appendix B for the proofs). Nevertheless, here we propose an exact
inference algorithm for stochastic context-free AOGs that is tractable under
a reasonable assumption on the number of valid compositions in a data sample.
Our algorithm is based on bottom-up dynamic programming and can be seen 
as a generalization of several previous exact inference algorithms designed for 
special cases of stochastic AOGs (such as the CYK algorithm for text parsing). 

Algorithm 1 shows the inference algorithm that returns the probability of
the most likely parse. After the algorithm terminates, the most likely parse can
be constructed by recursively backtracking the selected Or-rules from the start
symbol to the terminals. To compute the marginal probability of a data sample,
we simply replace the max operation with sum in line 20 of Algorithm 1.

In Algorithm 1 we assume the input AOG is in a generalized version of
Chomsky normal form, i.e., (1) each And-node has exactly two child nodes
which must be Or-nodes, (2) the child nodes of Or-nodes must not be Or-nodes,
and (3) the start symbol S is an Or-node. By extending previous studies [31],
it can be shown that any context-free AOG can be converted into this form
and that both the time complexity of the conversion and the size of the new AOG are
polynomial in the size of the original AOG. We give more details in the appendix.
Algorithm 1: Parsing with a stochastic context-free AOG
Input: a data sample $X$ consisting of a set of non-duplicate instances of terminal
nodes, a stochastic context-free AOG $G$ in Chomsky normal form
Output: the probability $p^*$ of the most likely parse of $X$
 1: Create an empty map $M$ /* $M[i, O, \theta, T]$ stores the probability of a valid composition
    of size $i$ with root Or-node $O$, parameter $\theta$, and set $T$ of terminal instances. */
 2: for all $x \in X$ do
 3:   $a \leftarrow$ the terminal node that $x$ is an instance of
 4:   $\theta \leftarrow$ the parameter of $x$
 5:   for all Or-rule $\langle O \to a, p \rangle$ in $G$ do
 6:     $M[1, O, \theta, \{x\}] \leftarrow p$
 7: for $i = 2$ to $|X|$ do
 8:   for $j = 1$ to $i - 1$ do
 9:     for all $\langle O_1, \theta_1, T_1 \rangle : M[j, O_1, \theta_1, T_1] = p_1$ do
10:       for all $\langle O_2, \theta_2, T_2 \rangle : M[i - j, O_2, \theta_2, T_2] = p_2$ do
11:         for all And-rule $\langle A \to O_1 O_2, t, f \rangle$ in $G$ do
12:           if $t(\theta_1, \theta_2) = \text{True}$ and $T_1 \cap T_2 = \emptyset$ then
13:             $\phi \leftarrow f(\theta_1, \theta_2)$
14:             $T \leftarrow T_1 \cup T_2$
15:             for all Or-rule $\langle O \to A, p_O \rangle$ in $G$ do
16:               $p \leftarrow p_O \, p_1 \, p_2$
17:               if $M[i, O, \phi, T]$ is null then
18:                 $M[i, O, \phi, T] \leftarrow p$
19:               else
20:                 $M[i, O, \phi, T] \leftarrow \max\{p, M[i, O, \phi, T]\}$
21: return $\max_\theta M[|X|, S, \theta, X]$ /* $S$ is the start symbol */


The basic idea of Algorithm 1 is to discover valid compositions of terminal
instances of increasing sizes, where the size of a composition is defined as the
number of terminal instances it contains. Size 1 compositions are simply the
terminal instances (lines 2-6). To discover compositions of size $i > 1$, the combinations
of any two compositions of sizes $j$ and $i - j$ ($j < i$) are considered (lines
7-20). A complete parse of the data sample is a composition of size $|X|$ with
its root being the start symbol $S$ (line 21).

The time complexity of Algorithm 1 is $O(|X|^2 c^2 |G| (|X| + |G|))$, where
$c = \max_i |C_i|$ and $C_i$ is the set of valid compositions of size $i$ in the data sample
$X$. In the worst case, when all possible compositions of terminal instances from
the data sample are valid, $c$ is on the order of $\binom{|X|}{\lfloor |X|/2 \rfloor}$ (the largest
number of size-$i$ subsets of $X$), which is exponential in $|X|$.

To make the algorithm tractable, we restrict the value of c with the following 
assumption on the input data sample. 


Composition Sparsity Assumption. For any data sample $X$ and any positive
integer $i \leq |X|$, the number of valid compositions of size $i$ in $X$ is polynomial
in $|X|$.














This assumption is reasonable in many scenarios. For text data, for a sentence
of length $m$, a valid composition is a substring of the sentence and the
number of substrings of size $i$ is $m - i + 1$. For image data, if we restrict the
compositions to be rectangular image patches (as in the hierarchical space tiling
model [?]), then for an image of size $m = n \times n$ it is easy to show that the
number of valid compositions of any specific size is no more than $n^3$.
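
Both counts can be checked numerically with a short sketch. The function names are hypothetical, and the image case assumes that the size of a patch is its number of pixels and that patches are axis-aligned rectangles.

```python
def num_substrings(m, i):
    """Number of substrings of length i in a sentence of length m."""
    return max(m - i + 1, 0)


def num_rect_patches(n, area):
    """Number of axis-aligned rectangular patches with `area` pixels in an n x n image."""
    count = 0
    for w in range(1, n + 1):  # at most n candidate widths
        if area % w == 0 and area // w <= n:
            h = area // w
            count += (n - w + 1) * (n - h + 1)  # at most n * n placements
    return count


# At most n widths, each with at most n^2 placements, gives the n^3 bound.
for n in range(1, 9):
    assert all(num_rect_patches(n, a) <= n ** 3 for a in range(1, n * n + 1))
```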

3 Logic Perspective of Stochastic AOGs 

In a stochastic AOG, And-rules model the relations between terminal and nonterminal
instances and Or-rules model the uncertainty in the compositional
structure. By combining these two types of rules, stochastic AOGs can be seen
as probabilistic models of relational structures and are hence related to the field
of statistical relational learning [?]. In this section, we make this connection explicit
by providing probabilistic logic interpretations of stochastic AOGs. By establishing
this connection, we hope to facilitate the exchange of ideas and results
between the two previously separated research areas.

3.1 Interpretation as Probabilistic Logic 

We first discuss an interpretation of stochastic context-free AOGs as a subset of 
first-order probabilistic logic with a possible-world semantics. The intuition is 
that we interpret terminal and nonterminal nodes of an AOG as unary relations, 
use binary relations to connect the instances of terminal and nonterminal nodes 
to form the parse tree, and use material implication to represent grammar rules. 

We first describe the syntax of our logic interpretation of stochastic context- 
free AOGs. There are two types of formulas in the logic: And-rules and Or-rules. 
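Each And-rule takes the following form (for some $n \geq 2$).

$$\forall x \, \exists y_1, y_2, \ldots, y_n, \; A(x) \to \bigwedge_{i=1}^{n} \big( B_i(y_i) \wedge R_i(x, y_i) \big) \wedge R_\theta\big(\theta(x), \theta(y_1), \theta(y_2), \ldots, \theta(y_n)\big)$$
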


The unary relation $A$ corresponds to the left-hand side And-node of an And-rule
in the AOG; each unary relation $B_i$ corresponds to a child node of the
And-rule. We require that for each unary relation $A$, there is at most one
And-rule with $A(x)$ as the left-hand side. The binary relation $R_i$ is typically
the HasPart relation between an object and one of its parts, but $R_i$ could also
denote any other binary relation such as the Agent relation between an action
and its initiator, or the HasColor relation between an object and its color.
Note that these binary relations make explicit the nature of the composition
represented by each And-rule of the AOG. $\theta$ is a function that maps an object
to its parameter. $R_\theta$ is a relation that combines the parameter relation and
parameter function in the And-rule of the AOG and is typically factorized into
the conjunction of a set of binary relations.
Each Or-rule takes the following form.

$$\forall x, \; A(x) \to B(x) : p$$

The unary relation $A$ corresponds to the left-hand side Or-node and $B$ to the
child node of an Or-rule in the AOG; $p$ is the conditional probability of $A(x) \to B(x)$
being true when the grounded left-hand side $A(x)$ is true. We require that
for each true grounding of $A(x)$, among all the grounded Or-rules with $A(x)$
as the left-hand side, exactly one is true. This requirement can be represented
by two additional sets of constraint rules. First, Or-rules with the same left-hand
side are mutually exclusive, i.e., for any two Or-rules $\forall x, A(x) \to B_i(x)$
and $\forall x, A(x) \to B_j(x)$, we have $\forall x, A(x) \to B_i(x) \uparrow B_j(x)$, where $\uparrow$ is the
Sheffer stroke. Second, given a true grounding of $A(x)$, the Or-rules with $A(x)$
as the left-hand side cannot be all false, i.e., $\forall x, A(x) \to \bigvee_i B_i(x)$, where $i$
ranges over all such Or-rules. Further, to simplify inference and avoid potential
inconsistency in the logic, we require that the right-hand side unary relation
$B$ of an Or-rule cannot appear in the left-hand side of any Or-rule (i.e., the
second requirement in the generalized Chomsky normal form of AOG described
earlier).
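
The claim that the two sets of constraint rules together enforce "exactly one" can be verified by brute force. The sketch below is only a sanity check with hypothetical helper names; each tuple `bs` stands for the truth values of the right-hand sides $B_i(x)$ for one true grounding of $A(x)$.

```python
from itertools import combinations, product


def nand(a, b):
    """Sheffer stroke: false only when both arguments are true."""
    return not (a and b)


def constraints_hold(bs):
    pairwise_exclusive = all(nand(x, y) for x, y in combinations(bs, 2))
    at_least_one = any(bs)
    return pairwise_exclusive and at_least_one


# Exhaustive check for three Or-rules sharing the same left-hand side.
for bs in product([False, True], repeat=3):
    assert constraints_hold(bs) == (sum(bs) == 1)
```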

We can divide the set of unary relations into two categories: those that appear
in the left-hand side of rules (corresponding to the nonterminal nodes of
the AOG) and those that do not (corresponding to the terminal nodes). The
first category is further divided into two sub-categories depending on whether
the unary relation appears in the left-hand side of And-rules or Or-rules (corresponding
to the And-nodes and Or-nodes of the AOG respectively). We require
these two sub-categories to be disjoint. There is also a unique unary relation S 
that does not appear in the right-hand side of any rule, which corresponds to 
the start symbol of the AOG. 

Now we describe the semantics of the logic. The interpretation of all the
logical and non-logical symbols follows that of first-order logic. There are two
types of objects in the universe of the logic: normal objects and parameters.
There is a bijection between normal objects and parameters, and function $\theta$
maps a normal object to its corresponding parameter. A possible world is
represented by a pair $\langle X, L \rangle$ where $X$ is a set of objects and $L$ is a set of literals
that are true. We require that there exists exactly one normal object $s \in X$
such that $S(s) \in L$. In order for all the deterministic formulas (i.e., all the
And-rules and the two sets of constraint rules of all the Or-rules) to be satisfied,
the possible world must contain a tree structure in which:

1. each node denotes an object in X with the root node being s; 

2. each edge denotes a binary relation defined in some And-rule; 

3. for each leaf node $x$, there is exactly one terminal unary relation $T$ such
that $T(x) \in L$;

4. for each non-leaf node $x$, there is exactly one And-node unary relation $A$
such that $A(x) \in L$, and for the child nodes $\{y_1, y_2, \ldots, y_n\}$ of $x$ in the
tree, $\{B_i(y_i)\}_{i=1}^{n} \cup \{R_i(x, y_i)\}_{i=1}^{n} \cup \{R_\theta(\theta(x), \theta(y_1), \theta(y_2), \ldots, \theta(y_n))\} \subseteq L$
according to the And-rule associated with relation $A$;

5. for each node $x$, if for some Or-node unary relation $A$ we have $A(x) \in L$,
then among all the Or-rules with $A$ as the left-hand side, there is exactly
one Or-rule such that $B(x) \in L$ where $B$ is the right-hand side unary
relation of the Or-rule, and for the rest of the Or-rules we have $\neg B(x) \in L$.

We enforce the following additional requirements to ensure that the possible 
world contains no more and no less than the tree structure: 

1. No two nodes in the tree denote the same object. 

2. X and L contain only the objects and relations specified above. 

The probability of a possible world $\langle X, L \rangle$ is defined as follows. Denote by
$R^{Or}$ the set of Or-rules. For each Or-rule $r : \forall x, A(x) \to B(x)$, denote by $p_r$
the conditional probability associated with $r$ and define
$g_r := \{x \in X \mid A(x) \in L \wedge B(x) \in L\}$. Then we have:

$$P(\langle X, L \rangle) = \prod_{r \in R^{Or}} p_r^{|g_r|}$$
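
As a worked illustration of this definition, the sketch below computes the probability of a small possible world. The relation names, objects and probabilities are hypothetical, and true literals are encoded as `(relation, object)` pairs purely for convenience.

```python
def world_probability(or_rules, objects, literals):
    """Multiply p_r once for every object on which Or-rule r is used.

    or_rules: list of (A, B, p) triples for rules A(x) -> B(x) with probability p.
    literals: set of (relation, object) pairs that are true in the world.
    """
    prob = 1.0
    for A, B, p in or_rules:
        g_r = [x for x in objects if (A, x) in literals and (B, x) in literals]
        prob *= p ** len(g_r)
    return prob


or_rules = [
    ("S", "Face", 0.7), ("S", "Other", 0.3),
    ("Eye", "Open", 0.6), ("Eye", "Closed", 0.4),
]
objects = ["s", "e1", "e2"]
literals = {
    ("S", "s"), ("Face", "s"),
    ("Eye", "e1"), ("Open", "e1"),
    ("Eye", "e2"), ("Open", "e2"),
}
# One use of S -> Face and two uses of Eye -> Open: 0.7 * 0.6 * 0.6.
print(world_probability(or_rules, objects, literals))
```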


In this logic interpretation, parsing corresponds to the inference problem of 
identifying the most likely possible world in which the terminal relations and 
parameters of the leaf nodes of the tree structure match the atomic patterns in 
the input data sample. Computing the marginal probability of a data sample 
corresponds to computing the probability summation of the possible worlds that 
match the data sample. 

Our logic interpretation of stochastic context-free AOGs resembles tractable
Markov logic (TML) [?] in many aspects, even though the two have very different
motivations. Such similarity implies a deep connection between stochastic
AOGs and TML and points to a potential research direction of investigating
novel tractable statistical relational models by borrowing ideas from the stochastic
grammar literature. There are a few minor differences between stochastic
AOGs and TML, e.g., TML does not distinguish between And-nodes and Or- 
nodes, does not allow recursive rules, enforces that the right-hand side unary 
relation in each Or-rule is a sub-type of the left-hand side unary relation, and 
disallows a unary relation to appear in the right-hand side of more than one 
Or-rule. 

3.2 Interpretation as a Stochastic Logic Program 

Stochastic logic programs (SLP) [12] are a type of statistical relational models
that, like stochastic context-free AOGs, are a generalization of stochastic
context-free grammars. They are essentially equivalent to two other representations,
independent choice logic [33] and PRISM [34]. Here we show how
a stochastic context-free AOG can be represented by a pure normalized SLP
[35]. Since several inference and learning algorithms have been developed for
SLPs and PRISM, our reduction enables the application of these algorithms to
stochastic AOGs.

In our SLP program, we have one SLP clause for each And-rule and each
Or-rule in the AOG. The overall structure is similar to the probabilistic logic
interpretation discussed in section 3.1. For each And-rule, the corresponding
SLP clause takes the following form:

1.0 : a(X, P) :- b_1(X_1, P_1), b_2(X_2, P_2), ..., b_n(X_n, P_n),
      append([X_1, ..., X_n], X), r_1(X, X_1), r_2(X, X_2),
      ..., r_n(X, X_n), r_θ(P, P_1, ..., P_n).

The head a(X, P) represents the left-hand side And-node of the And-rule, where
X represents the set of terminal instances generated from the And-node and P
is the parameters of the And-node. In the body of the clause, b_i represents the
i-th child node of the And-rule, r_i represents the relation between the And-node
and its i-th child node, append(...) states that the terminal instance set X of
the And-node is the union of the instance sets from all the child nodes, and
r_θ represents a relation that combines the parameter relation and parameter
function of the And-rule. For relations r_i and r_θ, we need to have additional
clauses to define them according to the type of data being modeled.

For each Or-rule in the AOG, if the right-hand side is a nonterminal, then 
we have: 

$$p : a(X, P) \;\text{:-}\; b(X, P).$$

where $p$ is the conditional probability associated with the Or-rule, and $a$
and $b$ represent the left-hand and right-hand sides of the Or-rule
respectively, whose arguments $X$ and $P$ have the same meaning as explained
above. If the right-hand side of the Or-rule is a terminal, then we have:

$$p : a([t], [\ldots]).$$

where t is the right-hand side terminal node and the second argument represents 
the parameters of the terminal node. 

Finally, the goal of the program is 

$$\text{:-}\; s(X, P).$$

which represents the start symbol of the AOG, whose arguments have the same 
meaning as explained above. 
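
The clause templates above can be illustrated with a small generator. The
following is a minimal sketch, not part of the original reduction: the classes
`AndRule` and `OrRule`, the relation names `r1, ..., rn`, and the name
`r_theta` (standing in for $r_\theta$) are hypothetical placeholders, and the
terminal parameter list is left as the elided `[...]` from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AndRule:
    head: str            # name of the And-node, e.g. "a"
    children: List[str]  # names of the child nodes b_1 .. b_n


@dataclass
class OrRule:
    head: str                       # name of the Or-node
    prob: float                     # conditional probability of the rule
    child: Optional[str] = None     # nonterminal right-hand side, if any
    terminal: Optional[str] = None  # terminal right-hand side, if any


def and_clause(rule: AndRule) -> str:
    """Render the SLP clause for an And-rule (probability label 1.0)."""
    idx = range(1, len(rule.children) + 1)
    body = [f"{b}(X{i}, P{i})" for i, b in zip(idx, rule.children)]
    body.append("append([" + ", ".join(f"X{i}" for i in idx) + "], X)")
    body += [f"r{i}(X, X{i})" for i in idx]  # placeholder relation names
    body.append("r_theta(P, " + ", ".join(f"P{i}" for i in idx) + ")")
    return f"1.0 : {rule.head}(X, P) :- " + ", ".join(body) + "."


def or_clause(rule: OrRule) -> str:
    """Render the SLP clause for an Or-rule with label p."""
    if rule.child is not None:
        return f"{rule.prob} : {rule.head}(X, P) :- {rule.child}(X, P)."
    # Terminal case: the parameter list is elided in the text and kept as-is.
    return f"{rule.prob} : {rule.head}([{rule.terminal}], [...])."


print(and_clause(AndRule("a", ["b1", "b2"])))
print(or_clause(OrRule("a", 0.3, child="b")))
print(or_clause(OrRule("a", 0.7, terminal="t")))
```

The definitions of the relations themselves depend on the data type and are
deliberately not generated here.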


4 Conclusion 

Stochastic And-Or grammars extend traditional stochastic grammars of
language to model other types of data such as images and events. We have
provided a unified representation framework of stochastic AOGs that can be
instantiated for different data types. We have shown that many existing
grammar formalisms and probabilistic models in natural language processing,
computer vision, and machine learning can all be seen as special cases of
stochastic context-free AOGs. We have also proposed an inference algorithm for
parsing data samples using stochastic context-free AOGs and shown that the
algorithm is tractable under the composition sparsity assumption. In the
second part of the paper, we have provided interpretations of stochastic
context-free AOGs as a subset of first-order probabilistic logic and
stochastic logic programs. Our interpretations connect stochastic AOGs to the
field of statistical relational learning and clarify their relation with a few
existing statistical relational models.

Appendix A Related Models and Special Cases

Appendix A.1 Stochastic Context-Free Grammars

Definition 1. A stochastic context-free grammar (SCFG) is a 4-tuple
$(\Sigma, N, S, R)$:

• $\Sigma$ is a set of terminal symbols

• $N$ is a set of nonterminal symbols

• $S$ is a special nonterminal called the start symbol

• $R$ is a set of production rules, each of the form $A \to \alpha\ [p]$ where
$A \in N$, $\alpha \in (\Sigma \cup N)^*$, and $p$ is the conditional
probability $P(\alpha \mid A)$.

Any SCFG can be converted into And-Or normal form as described in [19].
The conversion results in a linear increase in the grammar size. 

Definition 2. An SCFG is in And-Or normal form iff. its nonterminal symbols 
are divided into two disjoint subsets: And-symbols and Or-symbols, such that: 

• each And-symbol appears on the left-hand side of exactly one production 
rule, and the right-hand side of the rule contains a sequence of two or 
more terminal or nonterminal symbols; 

• each Or-symbol appears on the left-hand side of one or more rules, each 
of which has a single terminal or nonterminal symbol on the right-hand 
side. 

Proposition 1. Any SCFG can be converted into And-Or normal form with 
linear increase in size. 

Proof. We construct an SCFG in And-Or normal form as follows. For each
production rule $A \to \alpha\ [p]$ with two or more symbols in $\alpha$,
create an And-symbol $B$ and replace the production rule with two new rules:
$A \to B\ [p]$ and $B \to \alpha\ [1.0]$. Regard all the nonterminals in the
original SCFG as Or-symbols. □
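
The construction in this proof can be sketched in a few lines. This is only an
illustration under stated assumptions: rules are encoded as hypothetical
`(lhs, rhs, probability)` triples, every right-hand side has at least one
symbol, and the fresh And-symbols are given the made-up names `AND_0`,
`AND_1`, and so on.

```python
from typing import List, Tuple

# (left-hand side, right-hand side symbols, conditional probability)
Rule = Tuple[str, Tuple[str, ...], float]


def to_and_or_normal_form(rules: List[Rule]) -> Tuple[List[Rule], List[str]]:
    """Split every rule with two or more right-hand side symbols in two."""
    new_rules: List[Rule] = []
    and_symbols: List[str] = []
    for lhs, rhs, p in rules:
        if len(rhs) >= 2:
            b = f"AND_{len(and_symbols)}"      # fresh And-symbol (assumed naming)
            and_symbols.append(b)
            new_rules.append((lhs, (b,), p))   # A -> B [p]
            new_rules.append((b, rhs, 1.0))    # B -> alpha [1.0]
        else:
            new_rules.append((lhs, rhs, p))    # already a valid Or-symbol rule
    return new_rules, and_symbols


rules = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("she",), 0.4),
    ("NP", ("Det", "N"), 0.6),
]
converted, and_symbols = to_and_or_normal_form(rules)
```

Each original rule yields at most two new rules, which is where the linear
bound on the grammar size comes from.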

Proposition 2. Any SCFG can be represented by a stochastic context-free AOG 
with linear increase in size. 

Proof. We first convert the SCFG into And-Or normal form. We then construct
an equivalent stochastic context-free AOG $(\Sigma, N, S, \theta, R)$:

• $\Sigma$ is the set of terminal symbols in the SCFG.

• $N$ is the set of nonterminal symbols in the SCFG, with a correspondence
from And-symbols to And-nodes and from Or-symbols to Or-nodes.

• $S$ is the start symbol of the SCFG.

• $\theta$ maps a substring represented by a terminal or nonterminal symbol to
its start/end positions in the complete sentence.

• $R$ is constructed from the set of production rules in the And-Or normal
form SCFG; each rule headed by an And-symbol becomes an And-rule, whose
parameter relation specifies that the substrings represented by the child
nodes must be adjacent (by checking their start/end positions) and whose
parameter function outputs the start/end positions of the concatenated
string represented by the parent And-node (i.e., the start position of the
leftmost substring and the end position of the rightmost substring); each
rule headed by an Or-symbol becomes an Or-rule with the same conditional
probability.

It is easy to verify that the size of the stochastic context-free AOG is linear in 
the size of the original SCFG. □ 
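
As an illustration of the parameter relation and parameter function used for
And-rules in this construction, here is a minimal sketch. It assumes that a
node parameter is a `(start, end)` pair with an exclusive end position and
that child spans are listed in the order of the original production rule;
both conventions and the function names are choices made for this example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive (assumed convention)


def adjacency_relation(child_spans: List[Span]) -> bool:
    """Parameter relation: consecutive child substrings must be adjacent."""
    return all(a[1] == b[0] for a, b in zip(child_spans, child_spans[1:]))


def concatenation_function(child_spans: List[Span]) -> Span:
    """Parameter function: span of the string represented by the And-node."""
    return (child_spans[0][0], child_spans[-1][1])


children = [(0, 2), (2, 5)]
if adjacency_relation(children):
    parent_span = concatenation_function(children)  # span of the parent node
```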

Appendix A.2 Linear Context-Free Rewriting Systems 

Linear context-free rewriting systems (LCFRS) [55] are a class of mildly
context-sensitive grammars, which subsume as special cases a few other grammar
formalisms [20, 22].

Definition 3. A linear context-free rewriting system is a 4-tuple
$(\Sigma, N, S, R)$:

• $\Sigma$ is a set of terminal symbols

• $N$ is a set of nonterminal symbols

• $S$ is a special nonterminal called the start symbol

• $R$ is a set of production rules, each of the form
$p : A[g(\beta_1, \ldots, \beta_r)] \to B_1[\beta_1], \ldots, B_r[\beta_r]$
such that $p$ is the conditional probability of the rule given $A$,
$A, B_1, \ldots, B_r \in N$, $\beta_i \in V^{\phi(B_i)}$ (for
$i = 1, \ldots, r$) where $\phi : N \to \mathbb{N}$ specifies the fan-out of a
nonterminal symbol and $V$ is a set of variables, and
$g : V^{\phi(B_1)} \times \ldots \times V^{\phi(B_r)} \to ((V \cup \Sigma)^+)^{\phi(A)}$
is a composition function that is linear and regular, i.e., in the equation

$$g(\beta_1, \ldots, \beta_r) = (t_1, \ldots, t_{\phi(A)})$$

each variable in $V$ appears at most once on each side of the equation and
the two sides of the equation contain exactly the same set of variables.
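
The linearity and regularity condition on the composition function can be
checked mechanically. The sketch below is illustrative only: it assumes a
hypothetical encoding in which the arguments $\beta_1, \ldots, \beta_r$ are
tuples of variable names and the result $(t_1, \ldots, t_{\phi(A)})$ is a list
of tuples mixing variable names and terminal symbols.

```python
from typing import List, Sequence, Set


def is_linear_and_regular(args: List[Sequence[str]],
                          result: List[Sequence[str]],
                          terminals: Set[str]) -> bool:
    """Check the condition on g(beta_1, ..., beta_r) = (t_1, ..., t_phi(A))."""
    lhs_vars = [v for beta in args for v in beta]
    rhs_vars = [s for t in result for s in t if s not in terminals]
    no_repeats = (len(lhs_vars) == len(set(lhs_vars))
                  and len(rhs_vars) == len(set(rhs_vars)))
    same_variables = set(lhs_vars) == set(rhs_vars)
    return no_repeats and same_variables


# Two children with fan-outs 2 and 1; the parent has fan-out 2 and inserts "a".
args = [("x1", "x2"), ("y1",)]
ok = is_linear_and_regular(args, [("x1", "a", "y1"), ("x2",)], {"a"})
copied = is_linear_and_regular(args, [("x1", "x1", "y1"), ("x2",)], {"a"})
dropped = is_linear_and_regular(args, [("x1", "y1")], {"a"})
```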

We can define And-Or normal form of LCFRS in a similar way as for SCFG. 

Definition 4. An LCFRS is in And-Or normal form iff. its nonterminal symbols 
are divided into two disjoint subsets: And-symbols and Or-symbols, such that: 

• each And-symbol appears on the left-hand side of exactly one production
rule, and the number of nonterminal symbols on the right-hand side of the
rule plus the number of terminals inserted by the composition function is
larger than or equal to two;

• each Or-symbol appears on the left-hand side of one or more rules, in each
of which the number of nonterminal symbols on the right-hand side plus the
number of terminals inserted by the composition function is one.

Proposition 3. Any LCFRS can be converted into And-Or normal form with 
linear increase in size. 

Proof. The conversion can be done in the same way as for SCFG. □ 

Proposition 4. Any LCFRS can be represented by a stochastic context-free 
AOG with linear increase in size. 

Proof. We first convert the LCFRS into And-Or normal form. We then
construct an equivalent stochastic context-free AOG
$(\Sigma, N, S, \theta, R)$:

• $\Sigma$ is the set of terminal symbols in the LCFRS.

• $N$ is the set of nonterminal symbols in the LCFRS, with a correspondence
from And-symbols to And-nodes and from Or-symbols to Or-nodes.

• $S$ is the start symbol of the LCFRS.

• $\theta$ maps a list of substrings represented by a terminal or nonterminal
symbol to a list of start/end positions of these substrings in the complete
sentence.

• R is constructed from the set of production rules in the And-Or normal 
form LCFRS: 

– Each rule headed by an And-symbol becomes an And-rule, whose
right-hand side includes all the right-hand side nonterminal symbols
of the original rule as well as all the terminal symbols added by the
composition function. Note that each of the substrings represented
by the And-symbol is formed by the composition function by
concatenating terminals and/or substrings represented by the
nonterminal symbols on the right-hand side of the rule. The parameter
relation enforces that these component substrings are adjacent (by
checking their start/end positions), and the parameter function
outputs the start/end positions of the concatenated strings.

– Each rule headed by an Or-symbol becomes an Or-rule with the
same conditional probability, whose right-hand side contains the
single right-hand side nonterminal symbol of the original rule or the
single terminal symbol from the composition function.

It is easy to verify that the size of the stochastic context-free AOG is linear in 
the size of the original LCFRS. □ 

Appendix A.3 Constraint-based Grammar Formalisms

Constraint-based grammar formalisms [8] associate feature structures with
nonterminals and use them to specify constraints in the grammar rules.

Definition 5. A feature structure is a set of attribute-value pairs. The value of 
an attribute is either an atomic symbol or another feature structure. A feature 
path in a feature structure is a list of attributes that leads to a particular value. 

Below is an example feature structure, in which ⟨Agreement Number⟩ is a
feature path leading to the atomic symbol value singular.

    [ Category    NP
      Agreement   [ Number   singular
                    Person   third    ] ]
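
To make the notion of a feature path concrete, the example feature structure
can be written as a nested dictionary and a path followed attribute by
attribute. This is a minimal sketch; the dictionary encoding and the helper
name `follow_path` are assumptions made for illustration.

```python
from typing import Sequence, Union

# A feature structure maps attributes to atomic symbols or nested structures.
np_features = {
    "Category": "NP",
    "Agreement": {"Number": "singular", "Person": "third"},
}


def follow_path(structure: dict, path: Sequence[str]) -> Union[str, dict]:
    """Return the value reached by following the attributes in `path`."""
    value: Union[str, dict] = structure
    for attribute in path:
        value = value[attribute]  # raises KeyError if the path does not exist
    return value


number = follow_path(np_features, ["Agreement", "Number"])
```

A feature constraint of the form "path = atomic value" then amounts to
comparing the result of `follow_path` with that value.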


Definition 6. A constraint-based grammar formalism is a 4-tuple
$(\Sigma, N, S, R)$:

• $\Sigma$ is a set of terminal symbols

• $N$ is a set of nonterminal symbols

• $S$ is a special nonterminal called the start symbol

• $R$ is a set of production rules, each of the form
$p : A \to \alpha\ \{C\}$ where $p$ is the conditional probability
$P(\alpha \mid A)$, $A \in N$, $\alpha \in (\Sigma \cup N)^*$, and $C$ is a
set of feature constraints; each nonterminal symbol in the rule is associated
with a feature structure; each feature constraint takes the form of either
“⟨X feature-path⟩ = atomic-value” or “⟨X feature-path⟩ = ⟨Y feature-path⟩”,
where X, Y are nonterminal symbols in the rule.

Proposition 5. Any constraint-based grammar formalism can be represented 
with linear increase in size by a generalization of stochastic context-free AOG 
that allows an And-rule to have only one symbol on the right-hand side. 

Proof. We construct an equivalent stochastic context-free AOG
$(\Sigma, N, S, \theta, R)$ in which we allow an And-rule to have only one
symbol on the right-hand side:

• $\Sigma$ is the set of terminal symbols in the constraint-based grammar
formalism.

• For $N$, all the nonterminal symbols of the constraint-based grammar
formalism become Or-nodes, and for each production rule we create an
And-node.

• $S$ is the start symbol of the constraint-based grammar formalism.

• $\theta$ maps a word represented by a terminal symbol to the start/end
positions of the word in the complete sentence and maps a substring
represented by a nonterminal symbol to a feature structure in addition to the
start/end positions of the substring.

• $R$ is constructed as follows. For each rule $p : A \to \alpha\ \{C\}$ in
the constraint-based grammar formalism, create one Or-rule $p : A \to B$ and
one And-rule $B \to \alpha$ where $B$ is a new And-node. Suppose $C'$ is a
copy of $C$ with all occurrences of $A$ replaced by $B$. Then the parameter
relation of the And-rule is the conjunction of the constraints in $C'$ that
do not involve $B$ plus the constraint that the substrings represented by the
child nodes must be adjacent (by checking their start/end positions); the
parameter function outputs the start/end positions of the concatenated string
as well as a new feature structure constructed according to the constraints
in $C'$ that involve $B$.

It is easy to verify that the size of the stochastic context-free AOG is linear in 
the size of the original constraint-based grammar formalism. □ 

Appendix A.4 Sum-Product Networks 

Sum-product networks (SPN) [9] are a type of deep probabilistic model
that can be more compact than traditional graphical models.

Definition 7. A sum-product network over random variables
$X_1, X_2, \ldots, X_d$ is a rooted directed acyclic graph. Each leaf node is
an indicator $x_i$ or $\bar{x}_i$. Each non-leaf node is either a sum node or
a product node. A sum node computes a weighted sum of its child nodes. A
product node computes the product of its child nodes. The value of an SPN is
the value of its root node. The scope of a node is the set of variables
appearing in its descendant leaf nodes. For an SPN to correctly compute the
probability of all evidence, the children of any sum node must have identical
scopes and the children of any product node cannot contain conflicting
descendant leaf nodes (i.e., $x_i$ in one child and $\bar{x}_i$ in another).

Definition 8. A decomposable SPN is an SPN in which the children of any 
product node have disjoint scopes. 
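
The scope conditions in Definitions 7 and 8 can be checked directly on a small
network. The following sketch is illustrative only: the SPN is encoded as a
hypothetical dictionary from node names to `(kind, payload)` pairs, sum-node
weights are omitted because they do not affect scopes, and the check covers
identical scopes at sum nodes and disjoint scopes at product nodes.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# node name -> ("leaf", variable) | ("sum", children) | ("product", children)
Node = Tuple[str, Union[str, List[str]]]


def scope(spn: Dict[str, Node], name: str) -> FrozenSet[str]:
    """Set of variables appearing in the descendant leaves of a node."""
    kind, payload = spn[name]
    if kind == "leaf":
        return frozenset([payload])
    return frozenset().union(*(scope(spn, child) for child in payload))


def has_valid_scopes(spn: Dict[str, Node]) -> bool:
    """Sum children share one scope; product children have disjoint scopes."""
    for kind, payload in spn.values():
        if kind == "sum":
            if len({scope(spn, child) for child in payload}) != 1:
                return False
        elif kind == "product":
            seen: set = set()
            for child in payload:
                child_scope = scope(spn, child)
                if seen & child_scope:
                    return False
                seen |= child_scope
    return True


spn = {
    "x1": ("leaf", "X1"), "not_x1": ("leaf", "X1"),
    "x2": ("leaf", "X2"), "not_x2": ("leaf", "X2"),
    "s1": ("sum", ["x1", "not_x1"]),
    "s2": ("sum", ["x2", "not_x2"]),
    "root": ("product", ["s1", "s2"]),
}
overlapping = dict(spn, root=("product", ["s1", "x1"]))
```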

It has been shown that any SPN can be converted into a decomposable SPN
with polynomial increase in size [36].

Proposition 6. Any decomposable SPN can be represented by a stochastic 
context-free AOG with linear increase in size. 

Proof. We construct an equivalent stochastic context-free AOG
$(\Sigma, N, S, \theta, R)$:

• $\Sigma$ is the set of leaf nodes (indicators) in the SPN.

• $N$ is the set of non-leaf nodes in the SPN, with a correspondence from
product nodes to And-nodes and from sum nodes to Or-nodes.

• $S$ is the root node of the SPN.

• $\theta$ maps any node instance to null (i.e., we set all the instance
parameters to null).

• $R$ is constructed as follows: for each product node in the SPN, create an
And-rule with the product node as the left-hand side and the set of child
nodes as the right-hand side, let the parameter relation be always true,
and let the parameter function always return null; for each child node of
each sum node in the SPN, create an Or-rule with the sum node as the
left-hand side, the child node as the right-hand side, and the normalized
weight of the child node as the conditional probability.

As shown in [36], normalization of the child node weights of the sum nodes
does not change the distribution modeled by the SPN. Therefore, for any
assignment to the random variables, the marginal probability computed by the
constructed stochastic context-free AOG and the probability computed by the
original SPN are always equal. It is easy to verify that the size of the
stochastic context-free AOG is linear in the size of the original SPN. □

Note that although SPNs are also general-purpose probabilistic models that 
can be used in modeling many types of data, stochastic AOGs go beyond SPNs 
in a few important aspects. Specifically, stochastic AOGs can simultaneously 
model data samples of different sizes, explicitly model relations, reuse grammar 
rules over different scopes, and allow recursive rules. These differences make 
stochastic AOGs better suited for certain domains and applications, e.g., to 
model recursion in language and translation invariance in computer vision. 


Appendix B Computational Complexity of Inference

We prove that the parsing problem of stochastic AOGs (i.e., given a data sample 
consisting of only terminal nodes, finding its most likely parse) is NP-hard. 

Theorem 1. The parsing problem of stochastic AOGs is NP-hard. 

Proof. Below we reduce 3SAT to the parsing problem. 

For a 3SAT CNF formula with $n$ variables and $k$ clauses, we construct a
stochastic AOG of polynomial size in $n$ and $k$. The node parameters in this
AOG always take the value of null (i.e., no parameter), and accordingly in
any And-rule of the AOG the parameter relation always returns true and the
parameter function always returns null. For each variable $x_i$, create one
Or-node $A_i$, two And-nodes $X_i$ and $\bar{X}_i$, and two Or-rules
$A_i \to X_i \mid \bar{X}_i$ with equal probabilities. Create an And-rule
$S \to \{A_1, A_2, \ldots, A_n\}$ where $S$ is the start symbol. For each
clause $c_j$, create an Or-node $B_j$, a terminal node $C_j$ and two Or-rules
$B_j \to C_j \mid \epsilon$ with equal probabilities. Here $\epsilon$
represents the empty set. For each literal $l$ (which can be either $x_i$ or
$\bar{x}_i$ for some $i$), suppose $L$ is the corresponding And-node (i.e.,
$X_i$ or $\bar{X}_i$); if $l$ appears in one or more clauses
$c_{h_1}, c_{h_2}, \ldots, c_{h_m}$, then create an And-rule
$L \to \{B_{h_1}, B_{h_2}, \ldots, B_{h_m}\}$; otherwise create an And-rule
$L \to \epsilon$. Note that the constructed AOG does not conform to the
standard definition of AOG in that it contains the empty set symbol
$\epsilon$ and that some And-rules may have only one child node. However, the
constructed AOG can be converted to the standard form with at most polynomial
increase in grammar size. See [37] for a list of CFG conversion approaches,
which can be extended for AOGs. For simplicity of the proof, we will still use
the non-standard form of the constructed AOG below.

We then construct a data sample which simply contains all the terminal
nodes with no duplication: $\{C_1, C_2, \ldots, C_k\}$.

We first prove that if the 3SAT formula is satisfiable, then the most likely
parse of the data sample can be found (i.e., there exists at least one valid
parse). Given a truth assignment that satisfies the 3SAT formula, we can
construct a valid parse tree. First of all, the parse tree shall contain the
start symbol and hence the production $S \to \{A_1, A_2, \ldots, A_n\}$. For
each variable $x_i$, if it is true in the assignment, then the parse tree
shall contain production $A_i \to X_i$; if it is false, then the parse tree
shall contain production $A_i \to \bar{X}_i$. For each clause $c_j$, select
one of its literals that are true and suppose $L$ is the corresponding
And-node; then the parse tree shall contain productions
$L \to \{\ldots, B_j, \ldots\} \to \{\ldots, C_j, \ldots\}$, where the first
production is based on the And-rule headed by $L$ and the second production
is based on Or-rule $B_j \to C_j$. In this way, all the terminal nodes in the
data sample are covered by the parse tree. Finally, for any $B_k$ node (for
some $k$) in the parse tree that does not produce $C_k$, add production
$B_k \to \epsilon$ to the parse tree. The parse tree construction is now
complete.

Next, we prove that if the most likely parse of the data sample can be found,
then the 3SAT formula is satisfiable. For each variable $x_i$, the parse tree
must contain either production $A_i \to X_i$ or production
$A_i \to \bar{X}_i$ but not both. In the former case, we set $x_i$ to true; in
the latter case, we set it to false. We can show that this truth assignment
satisfies the 3SAT formula. For each clause $c_j$ in the formula, suppose in
the parse tree the corresponding terminal node $C_j$ is a descendant of
And-node $L$ (which can be $X_i$ or $\bar{X}_i$ for some $i$). Let $l$ be the
literal corresponding to And-node $L$. According to the construction of the
AOG, clause $c_j$ must contain $l$. Based on our truth assignment specified
above, $l$ must be true and hence $c_j$ is true. Therefore, the 3SAT formula
is satisfied. □
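
The reduction can be sanity-checked on tiny formulas by brute force. The
sketch below is only an illustration under stated assumptions: literals are
encoded as signed integers (`+i` for $x_i$, `-i` for $\bar{x}_i$), clause
indices stand in for the nodes $B_j$ and $C_j$, and a parse is valid when the
emitted terminals are exactly $C_1, \ldots, C_k$, each once. The search is
exponential and is not the inference algorithm of the paper.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Clause = Tuple[int, ...]  # signed literals: +i for x_i, -i for its negation


def build_and_rules(n: int, clauses: Sequence[Clause]) -> Dict[int, List[int]]:
    """And-rule of each literal node: indices j of the Or-nodes B_j it produces."""
    return {lit: [j for j, c in enumerate(clauses) if lit in c]
            for i in range(1, n + 1) for lit in (i, -i)}


def has_valid_parse(n: int, clauses: Sequence[Clause]) -> bool:
    and_rules = build_and_rules(n, clauses)
    sample = Counter(range(len(clauses)))            # terminals C_j, each once
    for signs in product((1, -1), repeat=n):         # Or-rules A_i -> X_i | not-X_i
        b_nodes = [j for i, s in enumerate(signs, 1) for j in and_rules[s * i]]
        for emit in product((True, False), repeat=len(b_nodes)):  # B_j -> C_j | empty
            if Counter(j for j, e in zip(b_nodes, emit) if e) == sample:
                return True
    return False


def satisfiable(n: int, clauses: Sequence[Clause]) -> bool:
    return any(all(any(values[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for values in product((True, False), repeat=n))


sat_formula = [(1, 2, 3), (-1, -2, 3), (1, -2, -3)]
# All eight sign patterns over three variables: unsatisfiable.
unsat_formula = [(a, 2 * b, 3 * c) for a in (1, -1) for b in (1, -1) for c in (1, -1)]
```

On both formulas the existence of a valid parse agrees with satisfiability,
as the two directions of the proof require.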

Another inference problem of stochastic AOGs is to compute the marginal
probability of a data sample. The proof above can be easily adapted to show
that this problem is NP-hard as well (with the same AOG construction, one
can show that the 3SAT formula is satisfiable iff. the marginal probability is
nonzero).


Appendix C Conversion to Generalized Chomsky Normal Form

In our inference algorithm, we assume the input AOG is in a generalized version 
of Chomsky normal form, i.e., (1) each And-node has exactly two child nodes 
which must be Or-nodes, (2) the child nodes of Or-nodes must not be Or-nodes, 
and (3) the start symbol S is an Or-node. 

By extending previous approaches for context-free grammars [37], we can
convert any AOG into this generalized Chomsky normal form with the following
steps. Both the time complexity of the conversion and the size of the new AOG
are polynomial in the size of the original AOG.

1. (START) If the start symbol is an And-node, create a new Or-node as 
the start symbol that produces the original start symbol. 

2. (BIN) For any And-rule that contains more than two nodes on the
right-hand side, replace the And-rule with a set of binary And-rules, i.e.,
convert $A \to \{x_1, x_2, \ldots, x_n\}$ ($n > 2$) to
$A_1 \to \{x_1, x_2\}$, $A_2 \to \{A_1, x_3\}$, $\ldots$,
$A \to \{A_{n-2}, x_n\}$, where the $A_i$ are new And-nodes. We will discuss
how to convert the parameter relation and function later.

3. (UNIT) For any Or-rule with an Or-node on the right-hand side,
$O_1 \to O_2$, remove the Or-rule and for each Or-rule $O_2 \to x$ create a
new Or-rule $O_1 \to x$ (unless it already exists in the grammar).

4. (ALT) If an And-rule contains an And-node or terminal node on the 
right-hand side, replace the node with a new Or-node that produces the 
node. 

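In the BIN step, we have to binarize the parameter relation $t$ and function
$f$ along with the production rule, such that:

$$
\begin{aligned}
f(\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_n}) &= f_A(\theta_{A_{n-2}}, \theta_{x_n}) \\
\theta_{A_{n-2}} &= f_{A_{n-2}}(\theta_{A_{n-3}}, \theta_{x_{n-1}}) \\
&\;\;\vdots \\
\theta_{A_2} &= f_{A_2}(\theta_{A_1}, \theta_{x_3}) \\
\theta_{A_1} &= f_{A_1}(\theta_{x_1}, \theta_{x_2})
\end{aligned}
$$

and

$$
\begin{aligned}
t(\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_n}) \Leftrightarrow\;
& t_A(\theta_{A_{n-2}}, \theta_{x_n}) \wedge t_{A_{n-2}}(\theta_{A_{n-3}}, \theta_{x_{n-1}}) \\
& \wedge \cdots \wedge t_{A_2}(\theta_{A_1}, \theta_{x_3}) \wedge t_{A_1}(\theta_{x_1}, \theta_{x_2})
\end{aligned}
$$

In some cases (e.g., the example AOG of line drawings in the main text), the
parameter relation and function can be naturally factorized into this form. In
general, however, we have to cache multiple parameters of the right-hand side
nodes of the And-rule in the intermediate parameters
$\theta_{A_1}, \theta_{A_2}, \ldots, \theta_{A_{n-2}}$:

$$
\begin{aligned}
\theta_{A_1} = f_{A_1}(\theta_{x_1}, \theta_{x_2}) &:= (\theta_{x_1}, \theta_{x_2}) \\
\theta_{A_2} = f_{A_2}(\theta_{A_1}, \theta_{x_3}) &:= (\theta_{x_1}, \theta_{x_2}, \theta_{x_3}) \\
&\;\;\vdots \\
\theta_{A_{n-2}} = f_{A_{n-2}}(\theta_{A_{n-3}}, \theta_{x_{n-1}}) &:= (\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_{n-1}})
\end{aligned}
$$

then we define

$$f_A(\theta_{A_{n-2}}, \theta_{x_n}) := f(\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_n})$$

and

$$
\begin{aligned}
t_{A_1}(\theta_{x_1}, \theta_{x_2}) = t_{A_2}(\theta_{A_1}, \theta_{x_3}) = \cdots = t_{A_{n-2}}(\theta_{A_{n-3}}, \theta_{x_{n-1}}) &:= \mathit{true} \\
t_A(\theta_{A_{n-2}}, \theta_{x_n}) &:= t(\theta_{x_1}, \theta_{x_2}, \ldots, \theta_{x_n})
\end{aligned}
$$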

Note that the sizes of the intermediate parameters can be polynomial in $n$.
This actually violates the requirement that the parameter size shall be upper
bounded by a constant. Nevertheless, when running our inference algorithm on
the resulting Chomsky normal form AOG, the inference time complexity is only
slightly affected, with the last factor $(|X| + |G|)$ changed to a function
polynomial in $|X|$ and $|G|$, and hence the condition for tractable inference
remains unchanged.
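
The BIN step with cached intermediate parameters can be sketched as follows.
This is a minimal illustration, not the conversion procedure itself: rules are
encoded as hypothetical `(head, (left, right))` pairs, the new And-nodes are
given made-up names such as `A_1`, and the original parameter function `f`
and relation `t` are assumed to be ordinary Python callables over the child
parameters.

```python
from typing import Callable, List, Sequence, Tuple


def binarize(head: str, children: Sequence[str]) -> List[Tuple[str, Tuple[str, str]]]:
    """A -> {x1..xn} becomes A_1 -> {x1, x2}, A_2 -> {A_1, x3}, ..., A -> {A_(n-2), xn}."""
    assert len(children) > 2
    rules, left = [], children[0]
    for i, right in enumerate(children[1:-1], start=1):
        new_node = f"{head}_{i}"               # new And-node (assumed naming)
        rules.append((new_node, (left, right)))
        left = new_node
    rules.append((head, (left, children[-1])))
    return rules


def evaluate_binarized(thetas: Sequence, f: Callable, t: Callable):
    """Apply the cached-parameter scheme bottom-up to child parameters."""
    cache = (thetas[0],)
    for theta in thetas[1:-1]:
        # Intermediate f_{A_i}: extend the cached tuple; t_{A_i} is always true.
        cache = cache + (theta,)
    full = cache + (thetas[-1],)               # arguments seen by the top rule
    return t(*full), f(*full)                  # t_A and f_A use the original t and f


rules = binarize("A", ["x1", "x2", "x3", "x4"])
total = lambda *a: sum(a)                              # toy parameter function
increasing = lambda *a: all(x < y for x, y in zip(a, a[1:]))  # toy relation
```

The cached tuple grows with the number of children, which is the source of
the parameter-size issue discussed in the note above.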